Solving Kernel Ridge Regression with Gradient Descent
Oskar Allerbo (Chalmers University of Technology & University of Gothenburg)
Abstract: We present an equivalent formulation of the objective function of kernel ridge regression (KRR) that opens up the possibility of studying KRR from the perspective of gradient descent. Using gradient descent with infinitesimal step size allows us to formulate a new regularization for kernel regression through early stopping.
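As a rough illustration of the early-stopping idea (not necessarily the equivalent formulation used in the talk), the sketch below runs plain gradient descent on a kernel least-squares loss and compares the early-stopped solution with closed-form KRR. The Gaussian kernel, data, step size, and stopping time are illustrative assumptions.

```python
# Minimal sketch: gradient descent on a kernel least-squares loss, where the
# stopping time plays the role of the ridge penalty. All choices below
# (kernel, data, step size, number of steps) are illustrative assumptions.
import numpy as np

def rbf_kernel(X, Z, bandwidth=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
K = rbf_kernel(X, X)

# Closed-form KRR: alpha = (K + lambda * I)^{-1} y
lam = 0.1
alpha_krr = np.linalg.solve(K + lam * np.eye(len(y)), y)

# Gradient descent on L(alpha) = 0.5 * ||y - K alpha||^2, stopped early.
alpha = np.zeros_like(y)
step = 1.0 / np.linalg.eigvalsh(K).max() ** 2   # safe step size for this loss
for t in range(200):                             # few steps <-> strong regularization
    grad = K @ (K @ alpha - y)
    alpha -= step * grad

print(np.linalg.norm(K @ alpha - y), np.linalg.norm(K @ alpha_krr - y))
```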
The gradient descent formulation of KRR allows us to extend to a time-dependent stationary kernel, where we decrease the bandwidth to zero during training. This circumvents the need for hyperparameter selection. Furthermore, we are able to achieve both zero training error and double descent behavior, phenomena that do not occur for KRR with a constant bandwidth but are known to appear for neural networks.
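The sketch below shows, under assumptions, what a shrinking-bandwidth training loop can look like: the Gram matrix is re-formed with a smaller bandwidth at every gradient step, so no fixed bandwidth has to be selected up front. The linear decay schedule and the Gaussian kernel are assumptions, not the schedule analyzed in the talk.

```python
# Sketch of the decreasing-bandwidth idea: rebuild the Gram matrix with a
# shrinking bandwidth at each gradient step. The decay schedule, kernel,
# data, and step size are illustrative assumptions.
import numpy as np

def rbf_kernel(X, Z, bandwidth=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)

alpha = np.zeros_like(y)
n_steps = 500
for t in range(n_steps):
    bw = 2.0 * (1.0 - t / n_steps) + 1e-2          # assumed linear decay towards zero
    K_t = rbf_kernel(X, X, bandwidth=bw)
    step = 1.0 / np.linalg.eigvalsh(K_t).max() ** 2
    alpha -= step * (K_t @ (K_t @ alpha - y))

# As the bandwidth shrinks, the fit approaches interpolation of the training data.
print(np.linalg.norm(K_t @ alpha - y))
```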
The new formulation of KRR also enables us to explore penalties other than the ridge penalty. Specifically, we explore the $\ell_1$ and $\ell_\infty$ penalties and show that these correspond to two flavors of gradient descent, thus alleviating the need for computationally heavy proximal gradient descent algorithms. We show theoretically and empirically how these formulations correspond to signal-driven and robust regression, respectively.
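The two flavors sketched below are coordinate descent and sign gradient descent, i.e. steepest descent with respect to the $\ell_1$ and $\ell_\infty$ norms of the update; which flavor corresponds to which penalty, and the signal-driven versus robust interpretation, is developed in the talk. The data, kernel, step sizes, and iteration counts are illustrative assumptions.

```python
# Sketch of two "flavors" of gradient descent on the kernel least-squares loss:
# coordinate descent (update only the coordinate with the largest gradient) and
# sign gradient descent (update every coordinate by the sign of its gradient).
# The mapping to the l1 and l_inf penalties is the subject of the talk; all
# numerical choices here are illustrative assumptions.
import numpy as np

def rbf_kernel(X, Z, bandwidth=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
K = rbf_kernel(X, X)

def grad(alpha):
    # Gradient of 0.5 * ||y - K alpha||^2 with respect to alpha.
    return K @ (K @ alpha - y)

# Coordinate descent: steepest descent w.r.t. the l1 norm of the update.
alpha_cd = np.zeros_like(y)
for _ in range(2000):
    g = grad(alpha_cd)
    j = np.argmax(np.abs(g))
    alpha_cd[j] -= 1e-3 * np.sign(g[j])

# Sign gradient descent: steepest descent w.r.t. the l_inf norm of the update.
alpha_sgd = np.zeros_like(y)
for _ in range(2000):
    alpha_sgd -= 1e-3 * np.sign(grad(alpha_sgd))

print(np.linalg.norm(K @ alpha_cd - y), np.linalg.norm(K @ alpha_sgd - y))
```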
Keywords: machine learning, optimization and control, statistics theory
Audience: researchers in the discipline
Series comments: The Gothenburg statistics seminar is open to the interested public; everybody is welcome. It usually takes place in MVL14 (http://maps.chalmers.se/#05137ad7-4d34-45e2-9d14-7f970517e2b60; see the specific talk). Speakers are asked to prepare material for 35 minutes, excluding questions from the audience.
Organizers: Akash Sharma*, Helga Kristín Ólafsdóttir* (*contact for this listing)
